AITopics | principal direction

Collaborating Authors

principal direction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PRIM-cipal components analysis

Liu, Tianhao, Díaz-Pachón, Daniel Andrés, Rao, J. Sunil

arXiv.org Machine LearningApr-20-2026

EVEN supervised learning is subject to the famous NoFree Lunch Theorems [1]-[3], which say that, in combinatorial optimization, there is no universal algorithm that works better than its competitors for every objective function [4]-[6]. Indeed, David Wolpert has recently proven that, on average, cross-validation performs as well as anti-crossvalidation (choosing among a set of candidate algorithms based on which has the worst out-of-sample behavior) for supervised learning. Still, he acknowledges that "it is hard to imagine any scientist who would not prefer to use [crossvalidation] to using anti-cross-validation" [7]. On the other hand, unsupervised learning has seldom been studied from the perspective of the NFLTs. This may be because the adjective "unsupervised" suggests that no human input is needed, which is misleading as many unsupervised tasks are combinatorial optimization problems that depend on the choice of the objective function. For instance, it is well known that, among the eigenvectors of the covariance matrix, Principal Components Analysis selects those with the largest variances [8]. However, mode-hunting techniques that rely on spectral manipulation aim at the opposite objective: selecting the eigenvectors of the covariance matrix with the smallest variances [9], [10]. Therefore, unlike in supervised learning, where it is difficult to identify reasons to optimize with respect to anti-cross-validation, in unsupervised learning there are strong reasons to reduce dimensionality for variance minimization. D. A. D ıaz-Pach on and T. Liu are with the Division of Biostatistics, University of Miami, Miami, FL, 33136 USA (e-mail: ddiaz3@miami.edu,

artificial intelligence, machine learning, principal component, (18 more...)

arXiv.org Machine Learning

2604.15538

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > Florida > Miami-Dade County > Miami (0.24)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Add feedback

The Universal Weight Subspace Hypothesis

Kaushik, Prakhar, Chaudhari, Shravan, Vaidya, Ankit, Chellappa, Rama, Yuille, Alan

arXiv.org Artificial IntelligenceDec-9-2025

We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain. Through mode-wise spectral analysis of over 1100 models - including 500 Mistral-7B LoRAs, 500 Vision Transformers, and 50 LLaMA-8B models - we identify universal subspaces capturing majority variance in just a few principal directions. By applying spectral decomposition techniques to the weight matrices of various architectures trained on a wide range of tasks and datasets, we identify sparse, joint subspaces that are consistently exploited, within shared architectures across diverse tasks and datasets. Our findings offer new insights into the intrinsic organization of information within deep networks and raise important questions about the possibility of discovering these universal subspaces without the need for extensive data and computational resources. Furthermore, this inherent structure has significant implications for model reusability, multi-task learning, model merging, and the development of training and inference-efficient algorithms, potentially reducing the carbon footprint of large-scale neural models.

large language model, machine learning, mistral-7b-instruct-v0, (18 more...)

arXiv.org Artificial Intelligence

2512.05117

Country:

North America > United States (0.45)
Asia > Middle East (0.27)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Path Not Taken: RLVR Provably Learns Off the Principals

Zhu, Hanqing, Zhang, Zhenyu, Huang, Hanxian, Su, DiJia, Liu, Zechun, Zhao, Jiawei, Fedorov, Igor, Pirsiavash, Hamed, Sha, Zhizhou, Lee, Jinwon, Pan, David Z., Wang, Zhangyang, Tian, Yuandong, Tai, Kai Sheng

arXiv.org Artificial IntelligenceNov-12-2025

Reinforcement Learning with Verifiable Rewards (RLVR) reliably improves the reasoning performance of large language models, yet it appears to modify only a small fraction of parameters. We revisit this paradox and show that sparsity is a surface artifact of a model-conditioned optimization bias: for a fixed pretrained model, updates consistently localize to preferred parameter regions, highly consistent across runs and largely invariant to datasets and RL recipes. We mechanistically explain these dynamics with a Three-Gate Theory: Gate I (KL Anchor) imposes a KL-constrained update; Gate II (Model Geometry) steers the step off principal directions into low-curvature, spectrum-preserving subspaces; and Gate III (Precision) hides micro-updates in non-preferred regions, making the off-principal bias appear as sparsity. We then validate this theory and, for the first time, provide a parameter-level characterization of RLVR's learning dynamics: RLVR learns off principal directions in weight space, achieving gains via minimal spectral drift, reduced principal-subspace rotation, and off-principal update alignment. In contrast, SFT targets principal weights, distorts the spectrum, and even lags RLVR. Together, these results provide the first parameter-space account of RLVR's training dynamics, revealing clear regularities in how parameters evolve. Crucially, we show that RL operates in a distinct optimization regime from SFT, so directly adapting SFT-era parameter-efficient fine-tuning (PEFT) methods can be flawed, as evidenced by our case studies on advanced sparse fine-tuning and LoRA variants. We hope this work charts a path toward a white-box understanding of RLVR and the design of geometry-aware, RLVR-native learning algorithms, rather than repurposed SFT-era heuristics.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.08567

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Silent Hazards of Token Reduction in Vision-Language Models: The Hidden Impact on Consistency

Sun, Yizheng, Li, Hao, Xu, Chang, Lin, Chenghua, Batista-Navarro, Riza, Sun, Jingyuan

arXiv.org Artificial IntelligenceMar-11-2025

Vision language models (VLMs) have excelled in visual reasoning but often incur high computational costs. One key reason is the redundancy of visual tokens. Although recent token reduction methods claim to achieve minimal performance loss, our extensive experiments reveal that token reduction can substantially alter a model's output distribution, leading to changes in prediction patterns that standard metrics such as accuracy loss do not fully capture. Such inconsistencies are especially concerning for practical applications where system stability is critical. To investigate this phenomenon, we analyze how token reduction influences the energy distribution of a VLM's internal representations using a lower-rank approximation via Singular Value Decomposition (SVD). Our results show that changes in the Inverse Participation Ratio of the singular value spectrum are strongly correlated with the model's consistency after token reduction. Based on these insights, we propose LoFi--a training-free visual token reduction method that utilizes the leverage score from SVD for token pruning. Experimental evaluations demonstrate that LoFi not only reduces computational costs with minimal performance degradation but also significantly outperforms state-of-the-art methods in terms of output consistency.

consistency, reduction, token reduction, (16 more...)

arXiv.org Artificial Intelligence

2503.06794

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Pre-training of Recurrent Neural Networks via Linear Autoencoders

Luca Pasa, Alessandro Sperduti

Neural Information Processing SystemsFeb-10-2025, 00:10:08 GMT

We propose a pre-training technique for recurrent neural networks based on linear autoencoder networks for sequences, i.e. linear dynamical systems modelling the target sequences. We start by giving a closed form solution for the definition of the optimal weights of a linear autoencoder given a training set of sequences. This solution, however, is computationally very demanding, so we suggest a procedure to get an approximate solution for a given number of hidden units. The weights obtained for the linear autoencoder are then used as initial weights for the inputto-hidden connections of a recurrent neural network, which is then trained on the desired task. Using four well known datasets of sequences of polyphonic music, we show that the proposed pre-training approach is highly effective, since it allows to largely improve the state of the art results on all the considered datasets.

artificial intelligence, machine learning, matrix, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Italy (0.04)

Industry:

Media > Music (0.34)
Leisure & Entertainment (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Principal Curvatures Estimation with Applications to Single Cell Data

Zhang, Yanlei, Mezrag, Lydia, Sun, Xingzhi, Xu, Charles, Macdonald, Kincaid, Bhaskar, Dhananjay, Krishnaswamy, Smita, Wolf, Guy, Rieck, Bastian

arXiv.org Artificial IntelligenceFeb-5-2025

The rapidly growing field of single-cell transcriptomic sequencing (scRNAseq) presents challenges for data analysis due to its massive datasets. A common method in manifold learning consists in hypothesizing that datasets lie on a lower dimensional manifold. This allows to study the geometry of point clouds by extracting meaningful descriptors like curvature. In this work, we will present Adaptive Local PCA (AdaL-PCA), a data-driven method for accurately estimating various notions of intrinsic curvature on data manifolds, in particular principal curvatures for surfaces. The model relies on local PCA to estimate the tangent spaces. The evaluation of AdaL-PCA on sampled surfaces shows state-of-the-art results. Combined with a PHATE embedding, the model applied to single-cell RNA sequencing data allows us to identify key variations in the cellular differentiation.

artificial intelligence, curvature, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2502.0375

Country:

North America > Canada > Quebec > Montreal (0.05)
Europe > Switzerland > Fribourg > Fribourg (0.05)
North America > United States > New York > Nassau County > Mineola (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective

Wen, Qishuai, Li, Chun-Guang

arXiv.org Artificial IntelligenceJan-14-2025

State-of-the-art methods for Transformer-based semantic segmentation typically adopt Transformer decoders that are used to extract additional embeddings from image embeddings via cross-attention, refine either or both types of embeddings via self-attention, and project image embeddings onto the additional embeddings via dot-product. Despite their remarkable success, these empirical designs still lack theoretical justifications or interpretations, thus hindering potentially principled improvements. In this paper, we argue that there are fundamental connections between semantic segmentation and compression, especially between the Transformer decoders and Principal Component Analysis (PCA). From such a perspective, we derive a white-box, fully attentional DEcoder for PrIncipled semantiC segemenTation (DEPICT), with the interpretations as follows: 1) the self-attention operator refines image embeddings to construct an ideal principal subspace that aligns with the supervision and retains most information; 2) the cross-attention operator seeks to find a low-rank approximation of the refined image embeddings, which is expected to be a set of orthonormal bases of the principal subspace and corresponds to the predefined classes; 3) the dot-product operation yields compact representation for image embeddings as segmentation masks. Experiments conducted on dataset ADE20K find that DEPICT consistently outperforms its black-box counterpart, Segmenter, and it is light weight and more robust.

principal subspace, segmentation, subspace, (15 more...)

arXiv.org Artificial Intelligence

2411.03033

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TL-PCA: Transfer Learning of Principal Component Analysis

Hendy, Sharon, Dar, Yehuda

arXiv.org Machine LearningOct-14-2024

Principal component analysis (PCA) can be significantly limited when there is too few examples of the target data of interest. We propose a transfer learning approach to PCA (TL-PCA) where knowledge from a related source task is used in addition to the scarce data of a target task. Our TL-PCA has two versions, one that uses a pretrained PCA solution of the source task, and another that uses the source data. Our proposed approach extends the PCA optimization objective with a penalty on the proximity of the target subspace and the source subspace as given by the pretrained source model or the source data. This optimization is solved by eigendecomposition for which the number of data-dependent eigenvectors (i.e., principal directions of TL-PCA) is not limited to the number of target data examples, which is a root cause that limits the standard PCA performance. Accordingly, our results for image datasets show that the representation of test data is improved by TL-PCA for dimensionality reduction where the learned subspace dimension is lower or higher than the number of target data examples.

principal direction, subspace, tl-pca, (15 more...)

arXiv.org Machine Learning

2410.10805

Country: Asia > Middle East > Syria > Daraa Governorate > Dar'a (0.04)

Genre: Research Report > New Finding (0.66)

Technology: